DM-46990: Add configuration for embargo SIA #26
Add an initial template for an SIA configuration to use with the embargo repo at USDF.
Codecov Report
All modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found.
Additional details and impacted files
@@ Coverage Diff @@
## main #26 +/- ##
=======================================
Coverage 82.29% 82.29%
=======================================
Files 18 18
Lines 1079 1079
Branches 174 174
=======================================
Hits 888 888
Misses 166 166
Partials 25 25
configs/embargo-sia.yaml
Outdated
@@ -0,0 +1,71 @@
facility_name: Rubin-LSST
obs_collection: LSST.Embargo
collections: [<!!!! JIM PLEASE FILL IN LIST OF COLLECTIONS CONTAINING IMAGES OF INTEREST !!!!>]
@TallJimbo I think we just need two things from you:
- A list of image dataset types from the embargo repo that will be of interest to Rubin staff
- A list of collections containing those images
The interface they will be accessed through is similar to this one:
This older config for LATISS images may be useful inspiration: https://github.com/lsst-dm/dax_obscore/blob/main/configs/usdf-embargo-live.yaml
(If you just give me the dataset type names I can work with Gregory to figure out the rest of the config associated with them.)
Dataset type names: raw, postISRCCD, calexp, pvi, deepCoadd, deepCoadd_calexp, goodSeeingCoadd, goodSeeingDiff_differenceExp
Note that this is definitely a list intended for a staff RSP; science users will not see all of these dataset types, as some of them are only useful when something has gone wrong and you want to see where it happened. It also does not include calibration frames; if there's a desire to include those, I'll have to get someone else to give you a list.
Collections:
- Source of truth is https://rubinobs.atlassian.net/wiki/spaces/DM/pages/48834013/Campaigns
- I think we're interested in the prompt processing, nightly-validation, and intermittent DRP campaigns here.
- This resolves to the following collection names: LSSTComCam/prompt/output-<day_obs>, LSSTComCam/nightlyValidation, LSSTComCam/runs/DRP/<data-date-range>/w_2024_XX/DM-XXXXX.
- We might want to publish the set of daily LSSTComCam/runs/nightlyValidation/{day_obs}/<lsst_distrib_tag>/DM-XXXXX collections instead of the umbrella LSSTComCam/nightlyValidation collection to keep butler queries simpler.
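For illustration, here is a minimal sketch of how the templated names above could be turned into shell-style globs and matched against concrete collection names. It uses only Python's standard `fnmatch` module and does not call the Butler. The helper `expand_collection_globs` and every concrete collection name are hypothetical, invented for this example.

```python
from fnmatch import fnmatchcase

# Glob patterns derived from the templated names in this comment,
# with placeholders such as <day_obs> replaced by "*".
PATTERNS = [
    "LSSTComCam/prompt/output-*",
    "LSSTComCam/nightlyValidation",
    "LSSTComCam/runs/DRP/*/w_2024_*/DM-*",
]


def expand_collection_globs(patterns, known_collections):
    """Return the known collection names matching any pattern, in input order."""
    return [
        name
        for name in known_collections
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


# Hypothetical collection names, made up for this example only.
known = [
    "LSSTComCam/prompt/output-2024-11-01",
    "LSSTComCam/nightlyValidation",
    "LSSTComCam/runs/DRP/20241101_20241113/w_2024_46/DM-00001",
    "LSSTComCam/calib",
]
matched = expand_collection_globs(PATTERNS, known)
```

Note that `fnmatchcase` lets `*` match across `/`, so these patterns are looser than a path-aware glob would be.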
Which aspect of the queries are you referring to when you say "keep butler queries simpler"?
The umbrella collection is a CHAINED collection with one member for each day, and (roughly) the same dataset types in each member. So the collection summaries will not help much in collapsing queries down to fewer actual RUN collections.
It looks like the SIA code is not currently set up to handle these -- it's not querying the globs at all, and even if it were, it does a find-first search, which is not compatible with glob searches.
There's a bit of an impedance mismatch between the way the
The SIA query is using a
Perhaps better would be if the Butler collections were exposed as ObsCore collections, so that the caller could specify which collections they wanted to search in. But the code isn't set up to do that right now -- the ObsCore collection name is just a static string defined in the config file. We might consider creating some chained collections in the embargo repo to serve as explicit top-level collections for SIA searches, which would make things easier to administer.
In any case the
Tweaking the SIA code itself is a bigger scope than I want to jump in and help with right now, so getting access to the other collections might need to wait until Tim is back.
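To make the point about collection summaries and find-first searches concrete, here is a toy model in plain Python. It is not Butler code: the data structures, the run names, and the helpers `prune_by_summary` and `find_first` are all hypothetical. It only illustrates that when every daily member holds the same dataset types, pruning by summary removes nothing, and that a find-first search returns a single member per data ID.

```python
# Toy model: a chained collection is an ordered list of (run_name, contents),
# where contents maps dataset type -> set of data IDs present in that run.
# All names and IDs below are invented for illustration.
chain = [
    ("nightly/20241102", {"calexp": {101, 102}, "pvi": {101, 102}}),
    ("nightly/20241101", {"calexp": {100, 101}, "pvi": {100, 101}}),
]


def prune_by_summary(chain, dataset_type):
    """Drop members whose summary (their set of dataset types) lacks dataset_type."""
    return [(name, contents) for name, contents in chain if dataset_type in contents]


def find_first(chain, dataset_type, data_id):
    """Return the first member, in chain order, holding the dataset, else None."""
    for name, contents in prune_by_summary(chain, dataset_type):
        if data_id in contents[dataset_type]:
            return name
    return None


# Every daily member holds the same dataset types, so pruning removes nothing.
remaining = [name for name, _ in prune_by_summary(chain, "calexp")]
# Data ID 101 exists in both members; find-first reports only the first in chain order.
hit = find_first(chain, "calexp", 101)
```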
Oh, ok, if the collections here are all queried together in the end anyway, then I'd recommend using the most recent DRP collection as the only one until we can rework things as you've described, as that runs a superset of the tasks in nightly-validation with consistent code and configuration, and now that we're not on-sky anymore there's no real value to being able to automatically pick up new processing from prompt or nightly-validation.
OK, that should get us a little further. The existing SIA code has the ability to override the collection config on the fly, so Stelios can add some code to the SIA service to locate and use the latest DRP collection.
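As a rough sketch of what "locate the latest DRP collection" could look like, the function below picks the name with the newest weekly tag from a list of collection names. It assumes that "latest" means the highest `w_YYYY_WW` tag in names shaped like LSSTComCam/runs/DRP/<data-date-range>/w_2024_XX/DM-XXXXX; that interpretation, the helper `latest_drp_collection`, and the example names are assumptions, not part of the SIA service. How the list of names is obtained from the Butler is deliberately left out.

```python
import re

# Matches names shaped like
# LSSTComCam/runs/DRP/<data-date-range>/w_2024_XX/DM-XXXXX
# and captures the year and week of the weekly tag.
WEEKLY_TAG = re.compile(r"^LSSTComCam/runs/DRP/[^/]+/w_(\d{4})_(\d{2})/DM-\d+$")


def latest_drp_collection(names):
    """Return the DRP collection with the newest weekly tag, or None if none match."""
    candidates = []
    for name in names:
        match = WEEKLY_TAG.match(name)
        if match:
            year, week = int(match.group(1)), int(match.group(2))
            candidates.append(((year, week), name))
    if not candidates:
        return None
    # Tuples compare by (year, week) first; ties fall back to the name.
    return max(candidates)[1]


# Hypothetical collection names, made up for this example only.
names = [
    "LSSTComCam/runs/DRP/20241101_20241113/w_2024_46/DM-11111",
    "LSSTComCam/runs/DRP/20241101_20241120/w_2024_47/DM-22222",
    "LSSTComCam/nightlyValidation",
]
latest = latest_drp_collection(names)
```

A name-based pick like this cannot tell a finished run from a failed or in-progress one, which is one reason a maintained chained collection is the simpler option.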
Or I guess we could ask the campaign team to maintain a chained collection with the latest DRP as the only child.
cc/ @stvoutsin
Just to make sure I understand what this means:
(Don't know if this actually works)
Where is the DRP collection path/string configured? Do I parse this from the obscore_config you will be publishing here, or is it something that needs to be part of the application configuration?
It would be something approximately like that, yeah. It would have to be a hack that's part of the application configuration. But hopefully you don't have to do anything -- after no-Slack-day ends I'm going to ask Campaign Management to maintain a collection for this, which is easier for you and will give us a way to keep failed and in-progress pipeline runs out of SIA. With the code in this PR as-is, you should have enough to start working on getting the service set up at USDF.
Since you're looking at this PR right now: I think the configuration will be pretty much identical to IDF, except the Butler repository is called
Ok sounds good, a preliminary branch with the phalanx/sia configuration for this is here:
Campaign Management has agreed to manage a collection
I am going to merge this so that it goes into tonight's build of
Add a configuration to use for the SIA service pointing to the embargo repository at USDF.